Parallel Execution

In analogy to the begin-end block, which groups statements for sequential execution, the parallel-endparallel block groups synchronous parallel statements. Semantically, every processing element (PE) executes the same statement on its own local values of the variables involved and thus obtains an individual result. The word parallel must not be misread: all statements contained in this block are executed sequentially under central control, but each one is performed data-parallel. Together with the single control flow, this reflects the single instruction, multiple data (SIMD) machine model.
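
The following is a minimal sketch of this execution model, written in Python rather than Parallaxis. It is a simulation under stated assumptions: each list index stands for one PE, and the helper name parallel_statement is hypothetical. The statements are issued one after another, and each one is applied to every PE's local values.

```python
# Simulation only, not Parallaxis code: list index i holds the local value of PE i.
def parallel_statement(op, *vectors):
    """Apply one instruction to the local values of every PE."""
    return [op(*local_values) for local_values in zip(*vectors)]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Two statements, issued sequentially by the central control;
# each is carried out by all PEs on their own data.
c = parallel_statement(lambda x, y: x + y, a, b)   # c := a + b
d = parallel_statement(lambda x: x * 2, c)         # d := c * 2
```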

Because of the SIMD restriction, each PE may either execute the current instruction or remain idle; concurrent execution of different instructions is not possible. Parallaxis offers two ways of selecting PEs for parallel execution, which may also be combined:

A) by explicitly stating their network position at the entrance of a parallel block for each dimension specified, e.g., for a one-dimensional structure:

		PARALLEL [22..44]
		  <parallel statements>
		ENDPARALLEL

B) by using if- or case-selections, or while- or repeat-loops, with vector conditions, e.g.:

		PARALLEL
		  IF <vector expression> THEN <parallel statements> END
		ENDPARALLEL

In case A, only the PEs within the selected range execute the statements inside the parallel block; all others remain idle during that time. PE selections may be constant or variable positional expressions, such as a subrange, an enumeration, or a set. One selection is required for every dimension.
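
Case A can be sketched as follows for a one-dimensional structure. This is a Python simulation, not Parallaxis code; the helper parallel_select and the 1-based position numbering are assumptions made for illustration. A selection is modelled as any Python container of positions, so a subrange and a set can be handled alike.

```python
# Simulation only: values[i] is the local value of the PE at position i + 1.
def parallel_select(values, selection, statement):
    """Run `statement` on the PEs whose position is in `selection`; others stay idle."""
    return [statement(v) if pos in selection else v
            for pos, v in enumerate(values, start=1)]

x = [0] * 8
x = parallel_select(x, range(3, 6), lambda v: v + 1)   # subrange 3..5
x = parallel_select(x, {1, 8}, lambda v: v + 10)       # set of positions {1, 8}
```

Idle PEs keep their previous values, which is why the unselected entries pass through unchanged.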

In case B, a conditional expression determines which PEs will execute the statements of the then-branch and which will remain idle. The branching-condition may evaluate to true for one processor and to false for others. It is evaluated for each PE individually in parallel, and only those PEs for which it evaluates to true execute the then-branch; all other PEs remain idle. If there is an else-branch, it is executed subsequently by the complementary PE group.

The two branches of an if-selection cannot be executed in parallel because of the previously mentioned "single control flow" SIMD restriction. Therefore, they have to be serialized. Processors that execute the then-part may continue, while processors that will execute the else-part are blocked, with their identifications pushed onto a global stack at the controlling host. They remain inactive while the host supervises execution of the then-part. Afterwards, when the else-part is executed, the two processor sets change places: the "then-processors" become inactive for some time while the "else-processors" are active. The use of a dynamic stack also accounts for nested if-statements, since each stack entry corresponds to one nesting level. Because serialization degrades the performance of a parallel system, the user should keep this fact in mind when designing a parallel algorithm.
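
The serialization of the two branches and the role of the stack can be illustrated with a small Python simulation. It is only a sketch: the class SimdHost and its method names are hypothetical, and the stack here stores boolean activity masks instead of PE identifications, which is an assumption about one possible representation and not a description of the actual Parallaxis implementation.

```python
# Simulation only: a toy host that tracks active PEs with a stack of masks.
class SimdHost:
    def __init__(self, n):
        self.active = [True] * n   # current activity mask, one flag per PE
        self.stack = []            # one entry per open if-statement

    def begin_if(self, cond):
        # Remember the enclosing mask and the PEs reserved for the else-branch.
        else_mask = [a and not c for a, c in zip(self.active, cond)]
        self.stack.append((self.active, else_mask))
        self.active = [a and c for a, c in zip(self.active, cond)]

    def begin_else(self):
        # The processor sets change places: the saved else-PEs become active.
        self.active = self.stack[-1][1]

    def end_if(self):
        # Restore the mask of the enclosing nesting level.
        self.active = self.stack.pop()[0]

    def assign(self, vec, fn):
        # One instruction, executed only by the currently active PEs.
        return [fn(v) if a else v for v, a in zip(vec, self.active)]

host = SimdHost(6)
x = [1, 2, 3, 4, 5, 6]

host.begin_if([v % 2 == 0 for v in x])   # IF even(x) THEN
x = host.assign(x, lambda v: v * 10)     #   x := x * 10
host.begin_if([v > 30 for v in x])       #   IF x > 30 THEN   (nested)
x = host.assign(x, lambda v: v + 1)      #     x := x + 1
host.end_if()                            #   END
host.begin_else()                        # ELSE
x = host.assign(x, lambda v: -v)         #   x := -x
host.end_if()                            # END
```

The then-part and the else-part run one after the other, so the if-selection costs the time of both branches even though each PE takes part in only one of them.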

The semantics of a parallel loop is analogous: only those PEs satisfying the loop-condition execute the loop-statements. A loop terminates only when none of the active PEs satisfies the loop-condition. As long as at least one PE remains, the loop continues while all PEs excluded by the loop-condition are idle. The controlling host therefore needs feedback from the network on whether any PEs remain. The implementation of this feedback depends on the hardware facilities of the target system. In any case, the OR-reduction of the condition-vector (see also section 4.3) returns this information to the host.
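
A parallel while-loop with this termination rule might be simulated as shown below. Again this is a Python sketch, not Parallaxis code: the function parallel_while is hypothetical, and Python's built-in any() stands in for the OR-reduction of the condition-vector, whose real implementation depends on the target hardware.

```python
# Simulation only: values[i] is the local value of PE i.
def parallel_while(values, condition, body):
    """Repeat `body` on the PEs that satisfy `condition` until no PE remains."""
    values = list(values)
    active = [True] * len(values)
    steps = 0
    while True:
        # Each PE evaluates the loop-condition on its own data;
        # a PE that has dropped out stays idle for the rest of the loop.
        active = [a and condition(v) for a, v in zip(active, values)]
        if not any(active):          # OR-reduction reported to the host
            return values, steps
        values = [body(v) if a else v for v, a in zip(values, active)]
        steps += 1

result, steps = parallel_while([1, 4, 7, 2], lambda v: v < 5, lambda v: v + 2)
```

The number of iterations is set by the PE that satisfies the condition longest; all other PEs are idle for the remaining passes.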